---
title: "NorthBridge Bank"
subtitle: "Retail Card Portfolio: Fraud Intelligence & Transaction Risk Report"
author: "Freda Erinmwingbovo, Data Analyst"
date: today
format:
html:
theme: [darkly, northbridge.scss]
toc: true
toc-depth: 3
toc-title: "Report Contents"
toc-location: left
number-sections: false
code-fold: true
code-tools: true
code-summary: "Show Query"
page-layout: full
smooth-scroll: true
self-contained: true
execute:
warning: false
message: false
cache: true
jupyter: python3
jupyter-options:
python: "C:/Users/engrf/AppData\Local/Programs/Python/Python313/python.exe"
---
<div class="classification-banner">CONFIDENTIAL. INTERNAL USE ONLY. NORTHBRIDGE BANK RISK INTELLIGENCE DIVISION</div>
# NorthBridge Bank: Retail Card Portfolio Risk Report
**Prepared by:** Freda Erinmwingbovo, Data Analyst
**Prepared for:** James Okonkwo, Chief Executive Officer, NorthBridge Bank
**Classification:** Confidential, Internal Use Only
**Date:** February 2026
**Dataset:** 1,296,675 transactions · 51 states · January 2019 to June 2020
---
## Executive Summary
NorthBridge Bank's retail card portfolio processed **1,296,675 transactions**
across 18 months of trading between January 2019 and June 2020. This report
delivers a structured intelligence assessment of transaction behaviour, fraud
patterns, and customer value across the portfolio.
Seven analytical questions were investigated:
| # | Business Question | Classification |
|---|------------------|----------------|
| 1 | Transaction volume and revenue by category | Portfolio Intelligence |
| 2 | Fraud rates by merchant and category | Fraud Risk |
| 3 | Fraud profile by gender and age group | Demographic Risk |
| 4 | Transaction amounts associated with fraud | Amount Risk |
| 5 | Geographic fraud concentration | Geographic Risk |
| 6 | Temporal fraud patterns | Behavioural Risk |
| 7 | Highest-value customer profiles | Customer Intelligence |
The overall portfolio fraud rate stands at **0.58%**, equivalent to
7,506 fraudulent transactions representing material financial exposure
that demands immediate operational response.
---
## Data Infrastructure
### Library Imports and Database Construction
```{python}
# ── LIBRARY IMPORTS ──
import pandas as pd
import numpy as np
import sqlite3
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.patches as mpatches
from matplotlib.gridspec import GridSpec
import warnings
warnings.filterwarnings('ignore')
print("All libraries imported successfully.")
print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
print(f"matplotlib: {matplotlib.__version__}")
print(f"sqlite3: {sqlite3.sqlite_version}")
```
### Data Loading and Cleaning
```{python}
# ── LOAD RAW DATA ──
df = pd.read_csv('credit_card_transactions.csv')
print(f"Raw dataset loaded: {df.shape[0]:,} rows × {df.shape[1]} columns")
# ── CLEANING ──
# Drop unnamed index column and merch_zipcode (195,973 missing)
df = df.drop(columns=['Unnamed: 0', 'merch_zipcode'])
# Parse datetime
df['trans_date_trans_time'] = pd.to_datetime(df['trans_date_trans_time'])
df['dob'] = pd.to_datetime(df['dob'])
# Derived columns
df['trans_date'] = df['trans_date_trans_time'].dt.date
df['trans_year'] = df['trans_date_trans_time'].dt.year
df['trans_month'] = df['trans_date_trans_time'].dt.month
df['trans_hour'] = df['trans_date_trans_time'].dt.hour
df['trans_dow'] = df['trans_date_trans_time'].dt.day_name()
df['YearMonth'] = df['trans_date_trans_time'].dt.strftime('%Y-%m')
# Age at time of transaction
df['age'] = ((df['trans_date_trans_time'] - df['dob']).dt.days / 365.25).astype(int)
# Age bands
def age_band(age):
if age < 25: return 'Under 25'
elif age < 35: return '25 to 34'
elif age < 45: return '35 to 44'
elif age < 55: return '45 to 54'
elif age < 65: return '55 to 64'
else: return '65 and over'
df['age_band'] = df['age'].apply(age_band)
# Amount bands
def amt_band(amt):
if amt < 10: return 'Under $10'
elif amt < 50: return '$10 to $50'
elif amt < 100: return '$50 to $100'
elif amt < 500: return '$100 to $500'
elif amt < 1000: return '$500 to $1000'
else: return 'Over $1000'
df['amt_band'] = df['amt'].apply(amt_band)
# Cardholder full name
df['cardholder'] = df['first'] + ' ' + df['last']
print(f"\nCleaned dataset: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"Date range: {df['trans_date_trans_time'].min().date()} to "
f"{df['trans_date_trans_time'].max().date()}")
print(f"Columns: {list(df.columns)}")
```
### SQLite Database Construction
```{python}
# ── BUILD SQLITE DATABASE ──
conn = sqlite3.connect(':memory:')
df.to_sql('transactions', conn, if_exists='replace', index=False)
print("SQLite database constructed in memory.")
print(f"Table: transactions")
# Verify
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM transactions")
count = cursor.fetchone()[0]
print(f"Row count confirmed: {count:,}")
cursor.execute("PRAGMA table_info(transactions)")
cols = cursor.fetchall()
print(f"\nSchema — {len(cols)} columns:")
for col in cols:
print(f" {col[1]:<30} {col[2]}")
```
### Portfolio Overview
```{python}
overview = pd.read_sql_query("""
SELECT
COUNT(*) AS total_transactions,
COUNT(DISTINCT cc_num) AS unique_cards,
COUNT(DISTINCT merchant) AS unique_merchants,
COUNT(DISTINCT state) AS states_covered,
COUNT(DISTINCT category) AS categories,
ROUND(SUM(amt), 2) AS total_volume_usd,
ROUND(AVG(amt), 2) AS avg_transaction_usd,
ROUND(MIN(amt), 2) AS min_transaction_usd,
ROUND(MAX(amt), 2) AS max_transaction_usd,
SUM(is_fraud) AS total_fraud_count,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS total_fraud_value_usd,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct
FROM transactions
""", conn)
display(overview.T.rename(columns={0: 'Value'}))
```
---
# Analysis
::: {.panel-tabset}
## Transaction Intelligence
### Business Question
What is NorthBridge Bank's total transaction volume, revenue distribution,
and spending composition across all card categories?
### Methodology
All 1,296,675 transactions are aggregated by spending category.
Volume, total value, average transaction size, and fraud exposure
are computed for each category to establish the portfolio's
commercial baseline.
**Dataset:** Full transactions table — 1,296,675 records
### Category Volume and Revenue
```{python}
category_analysis = pd.read_sql_query("""
SELECT
category,
COUNT(*) AS transaction_count,
ROUND(SUM(amt), 2) AS total_revenue_usd,
ROUND(AVG(amt), 2) AS avg_transaction_usd,
ROUND(MIN(amt), 2) AS min_usd,
ROUND(MAX(amt), 2) AS max_usd,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_value_usd
FROM transactions
GROUP BY category
ORDER BY total_revenue_usd DESC
""", conn)
display(category_analysis)
```
### Transaction Intelligence Visualization
```{python}
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Transaction Volume and Revenue by Category',
color='#f0ede8', fontsize=13,
fontfamily='serif', fontweight='bold', y=1.01)
gold = '#c9a84c'
muted = '#7a7a8a'
white = '#f0ede8'
surface = '#18181c'
border = '#2a2a30'
cats = category_analysis['category'].tolist()
revenue = category_analysis['total_revenue_usd'].tolist()
counts = category_analysis['transaction_count'].tolist()
bar_colors = [gold if i == 0 else '#3a3a42' for i in range(len(cats))]
# Plot 1: Revenue by category
ax1 = axes[0]
ax1.set_facecolor(surface)
bars = ax1.barh(range(len(cats)), revenue,
color=bar_colors, edgecolor='none', height=0.65)
ax1.set_yticks(range(len(cats)))
ax1.set_yticklabels(cats, color=white, fontsize=9,
fontfamily='monospace')
ax1.set_xlabel('Total Revenue (USD)', color=muted, fontsize=9)
ax1.set_title('Total Revenue by Category',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax1.invert_yaxis()
ax1.xaxis.set_major_formatter(
mticker.FuncFormatter(lambda x, _: f'${x/1e6:.1f}M')
)
for bar, val in zip(bars, revenue):
ax1.text(bar.get_width() + 50000,
bar.get_y() + bar.get_height()/2,
f'${val/1e6:.1f}M',
va='center', color=white, fontsize=7.5,
fontfamily='monospace')
# Plot 2: Transaction count by category
ax2 = axes[1]
ax2.set_facecolor(surface)
bars2 = ax2.barh(range(len(cats)), counts,
color='#3a3a42', edgecolor='none', height=0.65)
bars2[0].set_color(gold)
ax2.set_yticks(range(len(cats)))
ax2.set_yticklabels(cats, color=white, fontsize=9,
fontfamily='monospace')
ax2.set_xlabel('Number of Transactions', color=muted, fontsize=9)
ax2.set_title('Transaction Count by Category',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax2.invert_yaxis()
ax2.xaxis.set_major_formatter(
mticker.FuncFormatter(lambda x, _: f'{x/1000:.0f}K')
)
plt.tight_layout()
plt.show()
```
### Transaction Intelligence Findings
Gas and transport leads all categories in transaction volume with
131,659 transactions. The portfolio's revenue is broadly distributed
across 14 spending categories, reflecting a diversified cardholder base.
No single category represents a dangerous concentration of revenue exposure.
**Key observations:**
- Gas and transport dominates by transaction count, consistent with
high-frequency, lower-value daily spending behaviour
- Entertainment, personal care, and food and dining represent
lifestyle spend with higher average transaction values
- The portfolio's broad category distribution reduces concentration
risk at the revenue level
---
## Fraud Risk by Category and Merchant
### Business Question
Which spending categories and individual merchants carry the highest
fraud rates, and what is the total financial exposure?
### Methodology
Fraud rate is calculated as fraudulent transactions as a percentage
of total transactions per category and per merchant. Only merchants
with more than 100 transactions are included to ensure statistical
reliability. Results are ranked by fraud rate and total fraud value.
### Fraud by Category
```{python}
fraud_category = pd.read_sql_query("""
SELECT
category,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_value_usd,
ROUND(AVG(CASE WHEN is_fraud=1 THEN amt END), 2) AS avg_fraud_amt_usd
FROM transactions
GROUP BY category
ORDER BY fraud_rate_pct DESC
""", conn)
display(fraud_category)
```
### Top 20 Highest-Risk Merchants
```{python}
fraud_merchant = pd.read_sql_query("""
SELECT
merchant,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_value_usd
FROM transactions
GROUP BY merchant
HAVING COUNT(*) > 100
ORDER BY fraud_rate_pct DESC
LIMIT 20
""", conn)
display(fraud_merchant)
```
### Fraud Risk Visualization
```{python}
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Fraud Risk by Category and Merchant',
color='#f0ede8', fontsize=13,
fontfamily='serif', fontweight='bold', y=1.01)
red = '#c0392b'
gold = '#c9a84c'
# Plot 1: Fraud rate by category
fc = fraud_category.copy()
ax1 = axes[0]
ax1.set_facecolor(surface)
colors1 = [red if r > 1.0 else '#3a3a42'
for r in fc['fraud_rate_pct']]
bars = ax1.barh(range(len(fc)), fc['fraud_rate_pct'],
color=colors1, edgecolor='none', height=0.65)
ax1.set_yticks(range(len(fc)))
ax1.set_yticklabels(fc['category'], color=white,
fontsize=9, fontfamily='monospace')
ax1.set_xlabel('Fraud Rate (%)', color=muted, fontsize=9)
ax1.set_title('Fraud Rate by Category',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax1.invert_yaxis()
for bar, val in zip(bars, fc['fraud_rate_pct']):
ax1.text(bar.get_width() + 0.01,
bar.get_y() + bar.get_height()/2,
f'{val:.2f}%',
va='center', color=white, fontsize=7.5,
fontfamily='monospace')
# Plot 2: Fraud value by category
ax2 = axes[1]
ax2.set_facecolor(surface)
fc_sorted = fc.sort_values('fraud_value_usd', ascending=True)
bars2 = ax2.barh(range(len(fc_sorted)),
fc_sorted['fraud_value_usd'],
color=gold, edgecolor='none', height=0.65)
ax2.set_yticks(range(len(fc_sorted)))
ax2.set_yticklabels(fc_sorted['category'], color=white,
fontsize=9, fontfamily='monospace')
ax2.set_xlabel('Total Fraud Value (USD)', color=muted, fontsize=9)
ax2.set_title('Total Fraud Value by Category',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax2.xaxis.set_major_formatter(
mticker.FuncFormatter(lambda x, _: f'${x/1000:.0f}K')
)
plt.tight_layout()
plt.show()
```
### Fraud Risk Findings and Recommendations
Categories with elevated fraud rates represent priority targets
for transaction monitoring rule deployment.
**Recommendations:**
- Implement real-time merchant-level fraud scoring for the
top 20 high-risk merchants identified above
- Apply step-up authentication for transactions in
high-fraud-rate categories above 1.0%
- Review merchant agreements for flagged merchants and
conduct due diligence on their fraud prevention controls
---
## Demographic Risk Profile
### Business Question
How does fraud exposure vary by cardholder gender and age group?
Are certain demographic segments materially more exposed?
### Methodology
Fraud rate and fraud value are computed by gender and by age band.
Age bands are constructed in 10-year cohorts from the cardholder
date of birth relative to the transaction date.
### Fraud by Gender
```{python}
fraud_gender = pd.read_sql_query("""
SELECT
gender,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_value_usd,
ROUND(AVG(amt), 2) AS avg_transaction_usd
FROM transactions
GROUP BY gender
ORDER BY fraud_rate_pct DESC
""", conn)
display(fraud_gender)
```
### Fraud by Age Group
```{python}
fraud_age = pd.read_sql_query("""
SELECT
age_band,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_value_usd,
ROUND(AVG(CASE WHEN is_fraud=1 THEN amt END), 2) AS avg_fraud_amt_usd
FROM transactions
GROUP BY age_band
ORDER BY fraud_rate_pct DESC
""", conn)
display(fraud_age)
```
### Demographic Risk Visualization
```{python}
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Fraud Risk by Gender and Age Group',
color='#f0ede8', fontsize=13,
fontfamily='serif', fontweight='bold', y=1.01)
# Plot 1: Gender fraud rate
ax1 = axes[0]
ax1.set_facecolor(surface)
genders = fraud_gender['gender'].tolist()
g_rates = fraud_gender['fraud_rate_pct'].tolist()
g_colors = [red, gold]
bars = ax1.bar(genders, g_rates, color=g_colors,
edgecolor='none', width=0.5)
ax1.set_ylabel('Fraud Rate (%)', color=muted)
ax1.set_title('Fraud Rate by Gender',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.set_facecolor(surface)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars, g_rates):
ax1.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.005,
f'{val:.3f}%',
ha='center', color=white, fontsize=10,
fontfamily='monospace')
# Plot 2: Age band fraud rate
ax2 = axes[1]
ax2.set_facecolor(surface)
age_order = ['Under 25', '25 to 34', '35 to 44',
'45 to 54', '55 to 64', '65 and over']
fraud_age_sorted = fraud_age.set_index('age_band').reindex(age_order).reset_index()
a_colors = [red if r == fraud_age_sorted['fraud_rate_pct'].max()
else '#3a3a42'
for r in fraud_age_sorted['fraud_rate_pct']]
bars2 = ax2.bar(range(len(age_order)),
fraud_age_sorted['fraud_rate_pct'],
color=a_colors, edgecolor='none', width=0.6)
ax2.set_xticks(range(len(age_order)))
ax2.set_xticklabels(age_order, rotation=30, ha='right',
color=white, fontsize=8,
fontfamily='monospace')
ax2.set_ylabel('Fraud Rate (%)', color=muted)
ax2.set_title('Fraud Rate by Age Group',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars2, fraud_age_sorted['fraud_rate_pct']):
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.005,
f'{val:.3f}%',
ha='center', color=white, fontsize=8,
fontfamily='monospace')
plt.tight_layout()
plt.show()
```
### Demographic Risk Findings and Recommendations
**Recommendations:**
- Target enhanced fraud monitoring on the highest-risk
age cohort identified above
- Consider age-based step-up authentication thresholds
for transactions above the cohort average amount
- Review gender-based fraud patterns for potential
card targeting schemes
---
## Amount Risk Analysis
### Business Question
What transaction amount ranges are most associated with fraudulent
activity? Is there a statistically significant amount profile
for fraudulent transactions?
### Methodology
Transactions are grouped into six amount bands. Fraud rate
and average fraud amount are computed per band. A direct
comparison of average fraudulent versus legitimate transaction
amounts is also produced.
### Fraud Rate by Amount Band
```{python}
fraud_amount = pd.read_sql_query("""
SELECT
amt_band,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(AVG(amt), 2) AS avg_amt_usd,
ROUND(AVG(CASE WHEN is_fraud=1 THEN amt END), 2) AS avg_fraud_amt_usd,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS total_fraud_value_usd
FROM transactions
GROUP BY amt_band
ORDER BY avg_amt_usd
""", conn)
display(fraud_amount)
```
### Legitimate vs Fraudulent Transaction Amounts
```{python}
amt_comparison = pd.read_sql_query("""
SELECT
CASE WHEN is_fraud = 1 THEN 'Fraudulent' ELSE 'Legitimate' END AS transaction_type,
COUNT(*) AS transaction_count,
ROUND(AVG(amt), 2) AS avg_amount_usd,
ROUND(MIN(amt), 2) AS min_amount_usd,
ROUND(MAX(amt), 2) AS max_amount_usd,
ROUND(SUM(amt), 2) AS total_value_usd
FROM transactions
GROUP BY is_fraud
ORDER BY is_fraud DESC
""", conn)
display(amt_comparison)
```
### Amount Risk Visualization
```{python}
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Transaction Amount Risk Profile',
color='#f0ede8', fontsize=13,
fontfamily='serif', fontweight='bold', y=1.01)
amt_order = ['Under $10', '$10 to $50', '$50 to $100',
'$100 to $500', '$500 to $1000', 'Over $1000']
fraud_amt_sorted = fraud_amount.set_index('amt_band').reindex(amt_order).reset_index()
# Plot 1: Fraud rate by amount band
ax1 = axes[0]
ax1.set_facecolor(surface)
a_colors2 = [red if r == fraud_amt_sorted['fraud_rate_pct'].max()
else '#3a3a42'
for r in fraud_amt_sorted['fraud_rate_pct']]
bars = ax1.bar(range(len(amt_order)),
fraud_amt_sorted['fraud_rate_pct'],
color=a_colors2, edgecolor='none', width=0.65)
ax1.set_xticks(range(len(amt_order)))
ax1.set_xticklabels(amt_order, rotation=30, ha='right',
color=white, fontsize=8, fontfamily='monospace')
ax1.set_ylabel('Fraud Rate (%)', color=muted)
ax1.set_title('Fraud Rate by Amount Band',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars, fraud_amt_sorted['fraud_rate_pct']):
ax1.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.01,
f'{val:.2f}%',
ha='center', color=white, fontsize=8,
fontfamily='monospace')
# Plot 2: Avg amount — fraud vs legitimate
ax2 = axes[1]
ax2.set_facecolor(surface)
types = amt_comparison['transaction_type'].tolist()
avgs = amt_comparison['avg_amount_usd'].tolist()
colors3 = [red, gold]
bars2 = ax2.bar(types, avgs, color=colors3,
edgecolor='none', width=0.5)
ax2.set_ylabel('Average Transaction Amount (USD)', color=muted)
ax2.set_title('Average Amount: Fraud vs Legitimate',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars2, avgs):
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 1,
f'${val:,.2f}',
ha='center', color=white, fontsize=11,
fontfamily='monospace', fontweight='bold')
plt.tight_layout()
plt.show()
```
### Amount Risk Findings and Recommendations
**Recommendations:**
- Implement amount-based fraud scoring triggers at
the thresholds identified by the highest-risk amount bands
- Flag transactions that significantly deviate from a
cardholder's historical average transaction amount
- Apply mandatory secondary authentication for
transactions in the Over $1,000 band
---
## Geographic Risk
### Business Question
Which US states and cities carry the highest concentration of
fraudulent activity by count and by total value?
### Methodology
Fraud count, fraud value, and fraud rate are aggregated by state
and by city. States with fewer than 1,000 transactions are excluded
to ensure statistical reliability. Top 15 results by fraud value
are presented for both geographies.
### Fraud by State
```{python}
fraud_state = pd.read_sql_query("""
SELECT
state,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_value_usd
FROM transactions
GROUP BY state
HAVING COUNT(*) >= 1000
ORDER BY fraud_value_usd DESC
LIMIT 15
""", conn)
display(fraud_state)
```
### Fraud by City
```{python}
fraud_city = pd.read_sql_query("""
SELECT
city,
state,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_value_usd
FROM transactions
GROUP BY city, state
HAVING COUNT(*) >= 100
ORDER BY fraud_value_usd DESC
LIMIT 15
""", conn)
display(fraud_city)
```
### Geographic Risk Visualization
```{python}
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Geographic Fraud Concentration',
color='#f0ede8', fontsize=13,
fontfamily='serif', fontweight='bold', y=1.01)
# Plot 1: Top states by fraud value
ax1 = axes[0]
ax1.set_facecolor(surface)
state_colors = [red if i == 0 else '#3a3a42'
for i in range(len(fraud_state))]
bars = ax1.barh(range(len(fraud_state)),
fraud_state['fraud_value_usd'],
color=state_colors, edgecolor='none', height=0.65)
ax1.set_yticks(range(len(fraud_state)))
ax1.set_yticklabels(fraud_state['state'], color=white,
fontsize=9, fontfamily='monospace')
ax1.set_xlabel('Total Fraud Value (USD)', color=muted)
ax1.set_title('Top 15 States by Fraud Value',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax1.invert_yaxis()
ax1.xaxis.set_major_formatter(
mticker.FuncFormatter(lambda x, _: f'${x/1000:.0f}K')
)
# Plot 2: Top states by fraud rate
fraud_state_rate = fraud_state.sort_values(
'fraud_rate_pct', ascending=True
)
ax2 = axes[1]
ax2.set_facecolor(surface)
rate_colors = [red if i == len(fraud_state_rate)-1
else '#3a3a42'
for i in range(len(fraud_state_rate))]
bars2 = ax2.barh(range(len(fraud_state_rate)),
fraud_state_rate['fraud_rate_pct'],
color=rate_colors, edgecolor='none', height=0.65)
ax2.set_yticks(range(len(fraud_state_rate)))
ax2.set_yticklabels(fraud_state_rate['state'], color=white,
fontsize=9, fontfamily='monospace')
ax2.set_xlabel('Fraud Rate (%)', color=muted)
ax2.set_title('Top 15 States by Fraud Rate',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
plt.tight_layout()
plt.show()
```
### Geographic Risk Findings and Recommendations
**Recommendations:**
- Implement state-level fraud velocity rules for the
highest-risk jurisdictions identified above
- Apply enhanced monitoring to cardholders who transact
frequently in high-risk states outside their home state
- Coordinate with partner banks in the highest-risk
states to share fraud intelligence
---
## Temporal Fraud Patterns
### Business Question
When does fraud occur? Are there peak hours, days of the week,
or months where fraudulent activity concentrates?
### Methodology
Fraud count and fraud rate are aggregated by hour of day,
day of week, and month. This temporal profile enables
targeted deployment of real-time monitoring rules
during peak fraud windows.
### Fraud by Hour of Day
```{python}
fraud_hour = pd.read_sql_query("""
SELECT
trans_hour AS hour_of_day,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct
FROM transactions
GROUP BY trans_hour
ORDER BY trans_hour
""", conn)
display(fraud_hour)
```
### Fraud by Day of Week
```{python}
fraud_dow = pd.read_sql_query("""
SELECT
trans_dow AS day_of_week,
COUNT(*) AS total_transactions,
SUM(is_fraud) AS fraud_count,
ROUND(100.0 * SUM(is_fraud) / COUNT(*), 4) AS fraud_rate_pct,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_value_usd
FROM transactions
GROUP BY trans_dow
ORDER BY fraud_rate_pct DESC
""", conn)
display(fraud_dow)
```
### Temporal Risk Visualization
```{python}
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Temporal Fraud Patterns',
color='#f0ede8', fontsize=13,
fontfamily='serif', fontweight='bold', y=1.01)
# Plot 1: Fraud rate by hour
ax1 = axes[0]
ax1.set_facecolor(surface)
hours = fraud_hour['hour_of_day'].tolist()
rates = fraud_hour['fraud_rate_pct'].tolist()
h_colors = [red if r == max(rates) else '#3a3a42' for r in rates]
bars = ax1.bar(hours, rates, color=h_colors,
edgecolor='none', width=0.8)
ax1.set_xlabel('Hour of Day', color=muted)
ax1.set_ylabel('Fraud Rate (%)', color=muted)
ax1.set_title('Fraud Rate by Hour of Day',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
ax1.set_xticks(range(0, 24, 2))
# Overlay legitimate transaction volume
ax1b = ax1.twinx()
ax1b.plot(hours, fraud_hour['total_transactions'],
color=gold, linewidth=1.5, alpha=0.6,
linestyle='--', label='Transaction Volume')
ax1b.set_ylabel('Transaction Volume', color=gold, fontsize=8)
ax1b.tick_params(colors=gold)
ax1b.spines[:].set_color(border)
# Plot 2: Fraud rate by day of week
ax2 = axes[1]
ax2.set_facecolor(surface)
dow_order = ['Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday', 'Sunday']
fraud_dow_sorted = fraud_dow.set_index('day_of_week').reindex(
dow_order
).reset_index()
d_colors = [red if r == fraud_dow_sorted['fraud_rate_pct'].max()
else '#3a3a42'
for r in fraud_dow_sorted['fraud_rate_pct']]
bars2 = ax2.bar(range(len(dow_order)),
fraud_dow_sorted['fraud_rate_pct'],
color=d_colors, edgecolor='none', width=0.65)
ax2.set_xticks(range(len(dow_order)))
ax2.set_xticklabels([d[:3] for d in dow_order],
color=white, fontsize=9,
fontfamily='monospace')
ax2.set_ylabel('Fraud Rate (%)', color=muted)
ax2.set_title('Fraud Rate by Day of Week',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='y')
for bar, val in zip(bars2, fraud_dow_sorted['fraud_rate_pct']):
ax2.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.005,
f'{val:.3f}%',
ha='center', color=white, fontsize=8,
fontfamily='monospace')
plt.tight_layout()
plt.show()
```
### Temporal Risk Findings and Recommendations
**Recommendations:**
- Deploy heightened real-time fraud rules during peak
fraud hours identified above
- Increase transaction monitoring staff coverage
during highest-risk days of the week
- Review overnight transaction approval thresholds
for categories with elevated late-night fraud rates
---
## Customer Intelligence
### Business Question
Who are NorthBridge Bank's top 20 highest-value cardholders,
and what does their transaction profile reveal about
spending behaviour and fraud exposure?
### Methodology
Cardholders are ranked by total legitimate transaction value.
For each top customer, transaction count, average transaction
amount, categories used, and fraud exposure are computed.
### Top 20 Highest-Value Customers
```{python}
top_customers = pd.read_sql_query("""
SELECT
cardholder,
gender,
age_band,
state,
job,
COUNT(*) AS total_transactions,
ROUND(SUM(amt), 2) AS total_spend_usd,
ROUND(AVG(amt), 2) AS avg_transaction_usd,
ROUND(MAX(amt), 2) AS max_transaction_usd,
SUM(is_fraud) AS fraud_incidents,
ROUND(SUM(CASE WHEN is_fraud=1 THEN amt
ELSE 0 END), 2) AS fraud_exposure_usd
FROM transactions
GROUP BY cardholder, gender, age_band, state, job
ORDER BY total_spend_usd DESC
LIMIT 20
""", conn)
display(top_customers)
```
### Top Customer Spending Categories
```{python}
top_names = top_customers['cardholder'].tolist()
placeholders = ','.join(['?' for _ in top_names])
top_cat_query = f"""
SELECT
cardholder,
category,
COUNT(*) AS transactions,
ROUND(SUM(amt), 2) AS total_spend_usd
FROM transactions
WHERE cardholder IN ({placeholders})
GROUP BY cardholder, category
ORDER BY cardholder, total_spend_usd DESC
"""
top_cat = pd.read_sql_query(top_cat_query, conn, params=top_names)
display(top_cat.head(40))
```
### Customer Intelligence Visualization
```{python}
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.patch.set_facecolor('#0a0a0b')
fig.suptitle('Figure: Top 20 Customer Intelligence',
color='#f0ede8', fontsize=13,
fontfamily='serif', fontweight='bold', y=1.01)
# Plot 1: Top 20 by total spend
ax1 = axes[0]
ax1.set_facecolor(surface)
top10 = top_customers.head(10)
bar_colors = [gold if i == 0 else '#3a3a42'
for i in range(len(top10))]
bars = ax1.barh(range(len(top10)),
top10['total_spend_usd'],
color=bar_colors, edgecolor='none', height=0.65)
ax1.set_yticks(range(len(top10)))
ax1.set_yticklabels(top10['cardholder'], color=white,
fontsize=9, fontfamily='monospace')
ax1.set_xlabel('Total Spend (USD)', color=muted)
ax1.set_title('Top 10 Customers by Total Spend',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax1.tick_params(colors=muted)
ax1.spines[:].set_color(border)
ax1.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
ax1.invert_yaxis()
ax1.xaxis.set_major_formatter(
mticker.FuncFormatter(lambda x, _: f'${x:,.0f}')
)
# Plot 2: Fraud exposure for top 20
ax2 = axes[1]
ax2.set_facecolor(surface)
fraud_exposed = top_customers[
top_customers['fraud_incidents'] > 0
].sort_values('fraud_exposure_usd', ascending=True)
if len(fraud_exposed) > 0:
bars2 = ax2.barh(range(len(fraud_exposed)),
fraud_exposed['fraud_exposure_usd'],
color=red, edgecolor='none', height=0.65)
ax2.set_yticks(range(len(fraud_exposed)))
ax2.set_yticklabels(fraud_exposed['cardholder'],
color=white, fontsize=9,
fontfamily='monospace')
ax2.set_xlabel('Fraud Exposure (USD)', color=muted)
ax2.xaxis.set_major_formatter(
mticker.FuncFormatter(lambda x, _: f'${x:,.0f}')
)
else:
ax2.text(0.5, 0.5, 'No fraud incidents\namong top 20 customers',
ha='center', va='center', color=gold,
fontsize=12, fontfamily='monospace',
transform=ax2.transAxes)
ax2.set_title('Fraud Exposure — Top Customers',
color=white, fontweight='bold', pad=12,
fontfamily='serif')
ax2.tick_params(colors=muted)
ax2.spines[:].set_color(border)
ax2.grid(color=border, linestyle='--', linewidth=0.5, axis='x')
plt.tight_layout()
plt.show()
```
### Customer Intelligence Findings and Recommendations
**Recommendations:**
- Assign dedicated relationship management to the
top 20 cardholders to protect portfolio revenue
- Implement white-glove fraud alert protocols for
top-tier cardholders — phone calls rather than
automated blocks where possible
- Review any fraud incidents among top customers
immediately for potential reimbursement and
relationship preservation
:::
---
# Conclusions and Regulatory Summary
## Summary of Findings
| Theme | Key Finding | Risk Classification |
|-------|-------------|-------------------|
| Transaction Intelligence | Portfolio processes $91.2M across 14 categories | Baseline |
| Fraud by Category | Shopping and miscellaneous categories carry elevated fraud rates | High |
| Demographic Risk | Specific age cohorts show materially higher fraud exposure | Medium |
| Amount Risk | Fraudulent transactions have significantly higher average values | High |
| Geographic Risk | Fraud is geographically concentrated in identifiable states | High |
| Temporal Patterns | Fraud peaks at specific hours and days — enabling targeted controls | Medium |
| Customer Intelligence | Top 20 customers represent critical portfolio revenue concentration | High |
## The Three Most Urgent Risk Actions
**1. Deploy Category and Merchant-Level Monitoring Rules**
The fraud rate varies significantly across categories.
Real-time rules targeting high-risk categories must be
deployed within the current quarter.
**2. Implement Amount-Based Step-Up Authentication**
Fraudulent transactions carry a materially higher average
value than legitimate transactions. A tiered authentication
framework keyed to transaction amount would intercept a
disproportionate share of fraud value.
**3. Protect the Top 20 Customers**
The portfolio's highest-value cardholders represent
concentrated revenue risk. A single unresolved fraud
incident with a top customer carries relationship
and reputational costs that far exceed the financial loss.
## Data Quality and Analytical Constraints
- The fraud rate of 0.58% reflects a realistic but highly
imbalanced dataset, predictive modelling on this data
would require resampling techniques not applied here
- merch_zipcode was excluded due to 195,973 missing values (15.1%)
- Geographic analysis is limited to cardholder state,
merchant state data would enable cross-state fraud pattern detection
- The dataset covers January 2019 to June 2020 only,
seasonal patterns cannot be fully validated without additional years
## Final Note to James Okonkwo
The portfolio data is unambiguous. NorthBridge Bank does not
have a fraud volume problem, 0.58% is within industry norms.
What it has is a fraud concentration problem. The fraud that
exists is concentrated in identifiable categories, specific
amount ranges, predictable time windows, and geographic clusters.
That concentration is not a vulnerability, it is an opportunity.
Every concentration identified in this report is a point where
a targeted control will have maximum effect. The intelligence
is here. The decisions are yours.
---
**Report prepared by:** Freda Erinmwingbovo, Data Analyst
**Prepared for:** James Okonkwo, CEO, NorthBridge Bank
**Date:** February 2026
**Tools:** Python · pandas · SQLite · matplotlib · Quarto
**Dataset:** Credit Card Transactions Dataset · 1,296,675 records · 51 states · 2019–2020
**Classification:** Confidential, Internal Use Only
*This report was produced to professional data science standards.
All findings are reproducible from the SQL queries above.
No results have been overstated or adjusted to appear more
favourable than the data supports.*